Introduction
A data processor transforms data into a target format. It does three things:
- Reads from a data source, whether internal or external.
- Transforms the data step by step. A step can be, for example, filtering, aggregating, or sorting.
- Stores the output in a new data source of type "Derived Source from Another Source".
Data processors can process data of any size. However, depending on your pricing plan, there may be limits on the types and sizes of data you can process; see the pricing plan for details. You can manage data processors (create, modify, remove, list) in the configuration panel.
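The read, transform, and store stages above can be pictured as a small function. This is a minimal Python sketch, not the product's implementation: it assumes each data entry is a dictionary and each pipeline step is a function that takes and returns a list of entries (`run_pipeline` and the example steps are hypothetical names).

```python
from typing import Callable, Iterable, List

Row = dict  # assumption: one data entry = one dict of field -> value
Step = Callable[[List[Row]], List[Row]]


def run_pipeline(source: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Read all rows, apply each step in order, and return the derived rows."""
    data = list(source)      # 1. read from the data source
    for step in steps:       # 2. transform step by step
        data = step(data)
    return data              # 3. the caller stores this as a new derived source


people = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 12}, {"name": "Cy", "age": 20}]
adults_by_age = run_pipeline(people, [
    lambda rows: [r for r in rows if r["age"] > 18],    # a filter-like step
    lambda rows: sorted(rows, key=lambda r: r["age"]),  # a sort-like step
])
print(adults_by_age)  # [{'name': 'Cy', 'age': 20}, {'name': 'Ann', 'age': 30}]
```

Because each step only sees the output of the previous one, the order of steps matters, which is why the pipeline editor lets you reorder them.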
Building a data processing pipeline
Using the pipeline editor
The last step in creating or modifying a data processor is editing the data pipeline. The pipeline editor has four main sections. On the left-hand side is an overview of all pipeline steps, where you can add, modify, remove, and reorder steps. Clicking the "pencil" icon on a step opens the detailed edit mode, shown in the three sections on the right. The top section is the Detailed Step Editor: for example, if you are editing a "filter" step, this is where you specify the filter conditions. Below the Detailed Step Editor are the input and output previews for the step, which let you see how the step transforms real data.
Pipeline step reference
Filter Step
A Filter Step filters source data using conditions such as "equal to" and "greater than". It consists of one or more filtering conditions, each made up of a field, an operator, and a value, for example "first_name" (field), "equal to" (operator), "John" (value). You can manage the filtering conditions in the Detailed Step Editor. Multiple conditions are combined with either "AND" or "OR". For example, "first_name equal to John" AND "age greater than 18" keeps only the entries that satisfy both conditions.
List of operators
- greater than
- less than
- greater/equal to
- less/equal to
- equal to
- not equal to
- in
  - Matches if the data value is in a list of values. The list must be comma (",") separated, for example apple,pear,orange.
- not in
  - Matches if the data value is NOT in a list of values. The list must be comma (",") separated, for example apple,pear,orange.
- is number
  - Matches if the data value is a number. Empty values are also filtered out.
- is text
  - Matches if the data value is text. Empty values are also filtered out.
Filter Value
A filter value can be one of the following:
- Text
  - Shown in the Detailed Step Editor as a grey box labeled "ABC"
- Number
  - Shown in the Detailed Step Editor as a grey box labeled "123"
- Comma (",") separated list
  - Only used with the operators "in" and "not in"
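The filter behaviour described above can be approximated in Python as follows. This is an illustrative sketch only: `filter_step`, `OPERATORS`, and the (field, operator, value) tuple format are hypothetical, and details such as comparing "in" values as text or requiring every row to contain the compared fields are assumptions rather than documented behaviour.

```python
import numbers
from typing import Any, List, Tuple


def _as_list(value: str) -> List[str]:
    # "apple,pear,orange" -> ["apple", "pear", "orange"]
    return [item.strip() for item in value.split(",")]


def _is_number(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, numbers.Number) and not isinstance(v, bool)


# Comparison operators assume the compared field is present in every row.
OPERATORS = {
    "greater than": lambda a, b: a > b,
    "less than": lambda a, b: a < b,
    "greater/equal to": lambda a, b: a >= b,
    "less/equal to": lambda a, b: a <= b,
    "equal to": lambda a, b: a == b,
    "not equal to": lambda a, b: a != b,
    "in": lambda a, b: str(a) in _as_list(b),          # assumption: compared as text
    "not in": lambda a, b: str(a) not in _as_list(b),
    "is number": lambda a, _: _is_number(a),            # empty values never match
    "is text": lambda a, _: isinstance(a, str) and a != "",
}


def filter_step(rows, conditions: List[Tuple[str, str, Any]], combine: str = "AND"):
    """Keep rows where all ("AND") or any ("OR") of the conditions hold."""
    check = all if combine == "AND" else any
    return [
        row for row in rows
        if check(OPERATORS[op](row.get(field), value) for field, op, value in conditions)
    ]


rows = [{"first_name": "John", "age": 30}, {"first_name": "John", "age": 12}]
print(filter_step(rows, [("first_name", "equal to", "John"), ("age", "greater than", 18)]))
```

Switching `combine` to "OR" keeps a row as soon as one condition matches, mirroring the AND/OR choice in the Detailed Step Editor.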
Change Type Step
A Change Type Step changes one or more fields into the target type. Currently, two types are supported:
- Text
- Number
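A Change Type Step can be sketched in Python as a per-field conversion. This is a hypothetical helper, not the product's code: it assumes "Number" corresponds to a Python float, and a value that cannot be parsed raises `ValueError` here, whereas the product's behaviour for such values is not documented.

```python
def change_type(rows, fields, target_type):
    """Return copies of rows with the given fields converted to "Text" or "Number"."""
    # Assumption: "Number" -> float; unparseable text raises ValueError in this sketch.
    convert = str if target_type == "Text" else float
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for field in fields:
            new_row[field] = convert(row[field])
        result.append(new_row)
    return result


print(change_type([{"zip": 12345, "price": "9.5"}], ["zip"], "Text"))
```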
Select Step
A Select Step keeps a subset of columns and removes the rest. To choose the columns, click "Add Field" in the Detailed Step Editor.
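In terms of rows modelled as Python dictionaries, a Select Step amounts to keeping only the chosen keys. A minimal sketch (`select_step` is a hypothetical name; skipping fields missing from a row is an assumption):

```python
def select_step(rows, fields):
    """Keep only the listed fields (in the listed order) and drop every other column."""
    return [{field: row[field] for field in fields if field in row} for row in rows]


print(select_step([{"a": 1, "b": 2, "c": 3}], ["c", "a"]))  # [{'c': 3, 'a': 1}]
```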
Sort Step
A Sort Step sorts the data entries by one or more fields, each with its own ordering (ascending or descending). To add a sort field, click "Add Field" in the Detailed Step Editor.
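Sorting by several fields, each with its own direction, can be sketched with Python's stable sort. This is an illustration under the assumption that the first field listed has the highest priority; `sort_step` and the (field, ascending) pair format are hypothetical.

```python
def sort_step(rows, keys):
    """Sort rows by (field, ascending) pairs; earlier pairs take priority over later ones."""
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last gives the combined order.
    for field, ascending in reversed(keys):
        result.sort(key=lambda row: row[field], reverse=not ascending)
    return result


rows = [{"city": "B", "age": 30}, {"city": "A", "age": 20}, {"city": "B", "age": 40}]
print(sort_step(rows, [("city", True), ("age", False)]))
```

Applying the keys from last to first is a common way to combine mixed ascending and descending orderings without writing a custom comparison function.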
Pivot Step
A Pivot Step groups data entries by the Group By Fields and aggregates them by the Aggregation Fields. Each aggregation field setup consists of a field and an aggregation operator.
Supported aggregation operators:
- Sum
  - Works only on numbers
- Maximum
- Minimum
- Average
  - Works only on numbers
- Join
  - Concatenates multiple texts into one. Works only on text.
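The grouping and aggregation described above can be sketched in Python as follows. The names are hypothetical, and two details are assumptions not stated in this guide: the separator used by "Join" (", " here) and the naming of output columns (for example "Sum(amount)").

```python
from collections import defaultdict

# Hypothetical aggregation table; the "Join" separator is an assumption.
AGGREGATORS = {
    "Sum": sum,
    "Maximum": max,
    "Minimum": min,
    "Average": lambda values: sum(values) / len(values),
    "Join": lambda values: ", ".join(values),
}


def pivot_step(rows, group_by, aggregations):
    """Group rows by the group_by fields, then apply each (field, operator) aggregation per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in group_by)].append(row)  # groups keep first-seen order

    result = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for field, op in aggregations:
            # Output column naming such as "Sum(amount)" is an assumption of this sketch.
            out[f"{op}({field})"] = AGGREGATORS[op]([m[field] for m in members])
        result.append(out)
    return result


sales = [{"region": "N", "amount": 10, "rep": "Ann"}, {"region": "N", "amount": 30, "rep": "Cy"}]
print(pivot_step(sales, ["region"], [("amount", "Sum"), ("rep", "Join")]))
```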
Map Step
A Map Step updates a field value using various functions (e.g. arithmetic plus, minus, divide). A Map Step can have multiple mappings, each consisting of a target field and an expression. For example, set "Price" (target field) to "Price + 10" (expression). An expression can contain fields, functions, and constants. A function name must start with a dollar sign ("$"). A constant can be text or a number; a text constant must be enclosed in double quotes (""). The add icon (a circle with a plus sign) helps you write expressions.
Example Expressions
- $replace(first_name, " ", "")
  - This expression removes all spaces from the first_name field by replacing each space with an empty text.
- $concat("$", $text($round(Price, 2)))
  - This expression first rounds the Price field to 2 decimal places, then converts it to text. Finally, a dollar sign ($) is added in front of the rounded price.
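The Map Step and the two example expressions above can be mimicked with plain Python functions. This is a hedged sketch, not the product's expression language: each mapping is a hypothetical Python function, all mappings read the row as it was before the step (an assumption), and Python's `round()` (which rounds exact ties to even) stands in for `$round`, whose tie-breaking rule is not documented here.

```python
def remove_spaces(row: dict) -> str:
    # Python counterpart of: $replace(first_name, " ", "")
    return row["first_name"].replace(" ", "")


def price_label(row: dict) -> str:
    # Python counterpart of: $concat("$", $text($round(Price, 2)))
    return "$" + str(round(row["Price"], 2))


def map_step(rows, mappings):
    """Apply each target_field -> function mapping to every row, returning new rows."""
    result = []
    for row in rows:
        new_row = dict(row)
        for target, fn in mappings.items():
            new_row[target] = fn(row)  # assumption: expressions read the original row
        result.append(new_row)
    return result


rows = [{"first_name": "Mary Ann", "Price": 9.876}]
print(map_step(rows, {"first_name": remove_spaces, "PriceLabel": price_label}))
# [{'first_name': 'MaryAnn', 'Price': 9.876, 'PriceLabel': '$9.88'}]
```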
Supported Functions
- $length(TEXT_EXP)
  - Returns the number of characters in TEXT_EXP. Only for text fields.
- $subtext(TEXT_EXP, START, [LENGTH])
  - Returns a subtext of TEXT_EXP that begins at position START. If LENGTH is specified, the subtext has LENGTH characters; otherwise, it contains all characters from START to the end.
- $concat(TEXT_EXP, TEXT1, [TEXT2])
  - Returns a text that combines TEXT_EXP, TEXT1, and optionally TEXT2.
- $upper(TEXT_EXP)
  - Returns TEXT_EXP in uppercase.
- $lower(TEXT_EXP)
  - Returns TEXT_EXP in lowercase.
- $trim(TEXT_EXP)
  - Normalizes and trims all whitespace characters in TEXT_EXP by applying the following steps:
    1. All tabs, carriage returns, and line feeds are replaced with spaces.
    2. Contiguous sequences of spaces are reduced to a single space.
    3. Leading and trailing spaces are removed.
- $contains(TEXT_EXP, TARGET)
  - Returns true if TEXT_EXP contains the subtext TARGET.
- $replace(TEXT_EXP, SOURCE, TARGET)
  - Replaces the subtext SOURCE in TEXT_EXP with TARGET.
- $startswith(TEXT_EXP, TARGET)
  - Returns true if TEXT_EXP starts with the text TARGET.
- $endswith(TEXT_EXP, TARGET)
  - Returns true if TEXT_EXP ends with the text TARGET.
- $number(TEXT_EXP)
  - Casts TEXT_EXP to the number type.
- $text(NUMBER_EXP)
  - Casts NUMBER_EXP to the text type.
- $abs(NUMBER_EXP)
  - Returns the absolute value of NUMBER_EXP.
- $floor(NUMBER_EXP)
  - Returns NUMBER_EXP rounded down to the nearest integer less than or equal to it.
- $ceil(NUMBER_EXP)
  - Returns NUMBER_EXP rounded up to the nearest integer greater than or equal to it.
- $round(NUMBER_EXP, [PRECISION])
  - Returns NUMBER_EXP rounded to the number of decimal places given by the optional PRECISION parameter.
- $power(NUMBER_EXP, EXPONENT)
  - Returns NUMBER_EXP raised to the power of EXPONENT.
- $sqrt(NUMBER_EXP)
  - Returns the square root of NUMBER_EXP.
- $fromMillis(NUMBER_EXP)
  - Converts NUMBER_EXP, a number of milliseconds since the Unix Epoch (1 January 1970 UTC), to a formatted text representation of the timestamp.
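To make the text functions more concrete, here are Python sketches of `$trim` and `$subtext`. They are illustrations, not the product's implementation: `trim` follows the three documented steps literally, while `subtext` ASSUMES that START is zero-based, which this reference does not state.

```python
import re
from typing import Optional


def trim(text: str) -> str:
    """Apply the three $trim steps listed above."""
    text = re.sub(r"[\t\r\n]", " ", text)  # 1. tabs, carriage returns, line feeds -> spaces
    text = re.sub(r" {2,}", " ", text)     # 2. runs of spaces -> a single space
    return text.strip(" ")                 # 3. remove leading and trailing spaces


def subtext(text: str, start: int, length: Optional[int] = None) -> str:
    """Sketch of $subtext, ASSUMING START is zero-based (the reference does not say)."""
    if length is None:
        return text[start:]                # from START to the end
    return text[start:start + length]      # LENGTH characters (fewer near the end)


print(trim("  Hello\t\tworld \n"))  # Hello world
```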
Rename Step
A Rename Step updates a field name.
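For rows modelled as Python dictionaries, renaming a field means copying each row with one key swapped. A minimal sketch with a hypothetical `rename_step` helper:

```python
def rename_step(rows, old_name, new_name):
    """Return copies of rows in which the field old_name is called new_name."""
    return [
        {(new_name if key == old_name else key): value for key, value in row.items()}
        for row in rows
    ]


print(rename_step([{"fname": "John", "age": 30}], "fname", "first_name"))
```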
Rebase Step
A Rebase Step only works on hierarchical data (JSON). It rebases the root of the data to the path you select. For example, given a data source like:
```json
[
  {
    "customer": {"name": "John"},
    "account_balance": 100
  },
  {
    "customer": {"name": "Mary"},
    "account_balance": 20
  }
]
```
Rebasing on the path "customer" produces the following output:
```json
[
  {"name": "John"},
  {"name": "Mary"}
]
```
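The rebase example above can be reproduced with a short Python sketch. The helper name is hypothetical, and the dot-separated syntax for nested paths (such as "customer.address") is an assumption; the guide does not document how deeper paths are written.

```python
def rebase_step(records, path):
    """Replace each record with the value found at `path` inside it."""
    # Assumption: nested paths use dots, e.g. "customer.address".
    rebased = []
    for record in records:
        node = record
        for key in path.split("."):
            node = node[key]
        rebased.append(node)
    return rebased


source = [
    {"customer": {"name": "John"}, "account_balance": 100},
    {"customer": {"name": "Mary"}, "account_balance": 20},
]
print(rebase_step(source, "customer"))  # [{'name': 'John'}, {'name': 'Mary'}]
```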
Zip Step
A Zip Step only works on hierarchical data (JSON). It zips two arrays together element by element, producing an array of objects. For example, given source data like:
```json
{
  "open_price": [1, 2, 3],
  "close_price": [4, 5, 6]
}
```
After zipping, the output is:
```json
[
  {"open_price": 1, "close_price": 4},
  {"open_price": 2, "close_price": 5},
  {"open_price": 3, "close_price": 6}
]
```
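The zip example above maps directly onto Python's built-in `zip`. This sketch (hypothetical `zip_step` helper) accepts any number of parallel arrays, not just two; because `zip()` stops at the shortest array, unequal lengths are truncated here, which may differ from the product's undocumented behaviour.

```python
def zip_step(data):
    """Turn an object of parallel arrays into an array of objects, element by element."""
    fields = list(data)  # e.g. ["open_price", "close_price"]
    # zip() stops at the shortest array; the product's handling of unequal lengths is not documented.
    return [dict(zip(fields, values)) for values in zip(*(data[f] for f in fields))]


print(zip_step({"open_price": [1, 2, 3], "close_price": [4, 5, 6]}))
```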
Processing Big Data
You can build a data pipeline to process data of any size. Processing data under 128 MB (per data source) is free. To process data larger than 128 MB, you need to use the big data processing function, which requires an upgraded subscription. See the pricing plan for more details.
The only difference between small data and big data processing is that big data processing can take longer. The pipeline steps are more or less the same, but the processor output is not guaranteed to be available immediately; the processing time depends on the size of the data source and the complexity of the data pipeline.
For this reason, every big data processor is of type "Large" and has its own "Run History", which you can view in the configuration panel under the "Data Processors" tab.
Data Processor Management
Creating a data processor
Go to the configuration panel -> click "Data Processors" in the left menu -> click "Add Processor" in the top-right corner.
Modifying a data processor
Go to the configuration panel -> click "Data Processors" in the left menu -> find the row of the data processor you want to modify -> click the gear icon.
Deleting a data processor
Go to the configuration panel -> click "Data Processors" in the left menu -> find the row of the data processor you want to delete -> click the cross icon.
Report Bugs, Request New Features, and Win $50 Every Month
We value your feedback. Please contact us when you find a bug or would like to request a new feature: in the main panel, click the bob logo in the top-left corner, then click “contact for bugs or new features” in the drop-down menu. Every month, we select an “opinion leader” and reward them with $50 in cash.